| [1] HAN R Z, FENG W, GUO Q, et al. Single Object Tracking Research: A Survey. Chinese Journal of Computers, 2022, 45(9): 1877-1907. (in Chinese)
[2] TIAN Y L, WANG Y T, WANG J G, et al. Key Problems and Progress of Vision Transformers: The State of the Art and Prospects. Acta Automatica Sinica, 2022, 48(4): 957-979. (in Chinese)
[3] ZHANG T L, ZHANG Q. A Survey of RGB-T Object Tracking Technologies Based on Deep Learning. Pattern Recognition and Artificial Intelligence, 2023, 36(4): 327-353. (in Chinese)
[4] JIAO L C, ZHANG X, LIU X, et al. Transformer Meets Remote Sensing Video Detection and Tracking: A Comprehensive Survey. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2023, 16: 1-45.
[5] KUGARAJEEVAN J, KOKUL T, RAMANAN A, et al. Transformers in Single Object Tracking: An Experimental Survey. IEEE Access, 2023, 11: 80297-80326.
[6] MIN Z F, DU H, ZHU X Q, et al. Survey of Single Target Tracking Research. Optics & Optoelectronic Technology, 2023, 21(4): 1-14. (in Chinese)
[7] SUN Z W, QIAN L Z, YANG C D, et al. Survey of Visual Object Tracking Methods Based on Transformer. Journal of Computer Applications, 2024, 44(5): 1644-1654. (in Chinese)
[8] CHEN L, SHI L, LI Z H, et al. Survey of Deep Learning-Based UAV Single Object Tracking. Journal of Frontiers of Computer Science and Technology, 2026, 20(1): 40-65. (in Chinese)
[9] BOLME D S, BEVERIDGE J R, DRAPER B A, et al. Visual Object Tracking Using Adaptive Correlation Filters // Proc of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2010: 2544-2550.
[10] LI Y, ZHU J K. A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration // Proc of the 13th European Conference on Computer Vision. Berlin, Germany: Springer, 2014, II: 254-265.
[11] DANELLJAN M, HÄGER G, KHAN F S, et al. Accurate Scale Estimation for Robust Visual Tracking[C/OL].[2025-11-10]. https://www.cvl.isy.liu.se/research/objrec/visualtracking/scalvistrack/ScaleTracking_BMVC14.pdf.
[12] HENRIQUES J F, CASEIRO R, MARTINS P, et al. High-Speed Tracking with Kernelized Correlation Filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596.
[13] DANELLJAN M, ROBINSON A, KHAN F S, et al. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking // Proc of the 14th European Conference on Computer Vision. Berlin, Germany: Springer, 2016: 472-488.
[14] DANELLJAN M, BHAT G, KHAN F S, et al. ECO: Efficient Convolution Operators for Tracking // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2017: 6931-6939.
[15] HUANG C, LUCEY S, RAMANAN D. Learning Policies for Adaptive Tracking with Deep Feature Cascades // Proc of the IEEE International Conference on Computer Vision. Washington, USA: IEEE, 2017: 105-114.
[16] NAM H, HAN B. Learning Multi-domain Convolutional Neural Networks for Visual Tracking // Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2016: 4293-4302.
[17] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-Convolutional Siamese Networks for Object Tracking // Proc of the 14th European Conference on Computer Vision. Berlin, Germany: Springer, 2016: 850-865.
[18] ZHU Z, WANG Q, LI B, et al. Distractor-Aware Siamese Networks for Visual Object Tracking // Proc of the 15th European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 103-119.
[19] WANG Q, ZHANG L, BERTINETTO L, et al. Fast Online Object Tracking and Segmentation: A Unifying Approach // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2019: 1328-1338.
[20] XU Y D, WANG Z Y, LI Z X, et al. SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12549-12556.
[21] ZHANG D W, FU Y W, ZHENG Z L. UAST: Uncertainty-Aware Siamese Tracking // Proc of the 39th International Conference on Machine Learning. San Diego, USA: JMLR, 2022: 26161-26175.
[22] GUO D Y, SHAO Y Y, CUI Y, et al. Graph Attention Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2021: 9538-9547.
[23] DANELLJAN M, BHAT G, KHAN F S, et al. ATOM: Accurate Tracking by Overlap Maximization // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2019: 4655-4664.
[24] BHAT G, DANELLJAN M, VAN GOOL L, et al. Learning Discriminative Model Prediction for Tracking // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2019: 6181-6190.
[25] VASWANI A, SHAZEER N, PARMAR N, et al. Attention Is All You Need[C/OL].[2025-11-10]. https://arxiv.org/pdf/1706.03762.
[26] WANG N, ZHOU W G, WANG J, et al. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2021: 1571-1580.
[27] CHEN X, YAN B, ZHU J W, et al. Transformer Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2021: 8122-8131.
[28] YAN B, PENG H W, FU J L, et al. Learning Spatio-Temporal Transformer for Visual Tracking // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2021: 10428-10437.
[29] XIE F, WANG C Y, WANG G T, et al. Learning Tracking Representations via Dual-Branch Fully Transformer Networks // Proc of the IEEE/CVF International Conference on Computer Vision Workshops. Washington, USA: IEEE, 2021: 2688-2697.
[30] LIN L T, FAN H, ZHANG Z P, et al. SwinTrack: A Simple and Strong Baseline for Transformer Tracking // Proc of the 36th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2022: 16743-16754.
[31] YE B T, CHANG H, MA B P, et al. Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework // Proc of the 17th European Conference on Computer Vision. Berlin, Germany: Springer, 2022: 341-357.
[32] CHEN B Y, LI P X, BAI L, et al. Backbone Is All Your Need: A Simplified Architecture for Visual Object Tracking // Proc of the 17th European Conference on Computer Vision. Berlin, Germany: Springer, 2022: 375-392.
[33] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving Language Understanding by Generative Pre-training[C/OL].[2025-11-10]. https://gwern.net/doc/www/s3-us-west-2.amazonaws.com/d73fdc5ffa8627bce44dcda2fc012da638ffb158.pdf.
[34] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding // Proc of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies(Long and Short Papers). Stroudsburg, USA: ACL, 2019: 4171-4186.
[35] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale[C/OL].[2025-11-10]. https://arxiv.org/pdf/2010.11929.
[36] LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2021: 9992-10002.
[37] CARION N, MASSA F, SYNNAEVE G, et al. End-to-End Object Detection with Transformers // Proc of the 16th European Conference on Computer Vision. Berlin, Germany: Springer, 2020: 213-229.
[38] ZHU X Z, SU W J, LU L W, et al. Deformable DETR: Deformable Transformers for End-to-End Object Detection[C/OL].[2025-11-10]. https://arxiv.org/pdf/2010.04159.
[39] ZHENG S X, LU J C, ZHAO H S, et al. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2021: 6877-6886.
[40] WANG W H, XIE E Z, LI X, et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2021: 548-558.
[41] WU Y, LIM J, YANG M H. Object Tracking Benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1834-1848.
[42] MUELLER M, SMITH N, GHANEM B. A Benchmark and Simulator for UAV Tracking // Proc of the 14th European Conference on Computer Vision. Berlin, Germany: Springer, 2016: 445-461.
[43] MUELLER M, BIBI A, GIANCOLA S, et al. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild // Proc of the 15th European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 310-327.
[44] FAN H, LIN L T, YANG F, et al. LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2019: 5369-5378.
[45] HUANG L H, ZHAO X, HUANG K Q. GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562-1577.
[46] PENG L, GAO J Y, LIU X R, et al. VastTrack: Vast Category Visual Object Tracking // Proc of the 38th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2024: 130797-130818.
[47] WANG X, SHU X J, ZHANG Z P, et al. Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2021: 13758-13768.
[48] YAN S, YANG J Y, KÄPYLÄ J, et al. DepthTrack: Unveiling the Power of RGBD Tracking // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2021: 10725-10733.
[49] WANG X, LI J N, ZHU L, et al. VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows. IEEE Transactions on Cybernetics, 2024, 54(3): 1997-2010.
[50] LI C L, XUE W L, JIA Y Q, et al. LasHeR: A Large-Scale High-Diversity Benchmark for RGBT Tracking. IEEE Transactions on Image Processing, 2022, 31: 392-404.
[51] WANG Q, TENG Z, XING J L, et al. Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2018: 4854-4863.
[52] YU Y C, XIONG Y L, HUANG W L, et al. Deformable Siamese Attention Networks for Visual Object Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2020: 6727-6736.
[53] FU Z H, LIU Q J, FU Z H, et al. STMTrack: Template-Free Visual Tracking with Space-Time Memory Networks // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2021: 13769-13778.
[54] YU B, TANG M, ZHENG L Y, et al. High-Performance Discriminative Tracking with Transformers // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2021: 9836-9845.
[55] MAYER C, DANELLJAN M, BHAT G, et al. Transforming Model Prediction for Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 8721-8730.
[56] CHEN X, YAN B, ZHU J W, et al. High-Performance Transformer Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(7): 8507-8523.
[57] CAO Z A, HUANG Z Y, PAN L, et al. TCTrack: Temporal Contexts for Aerial Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 14778-14788.
[58] CAO Z A, HUANG Z Y, PAN L, et al. Towards Real-World Visual Tracking with Temporal Contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(12): 15834-15849.
[59] CAO Z A, FU C H, YE J J, et al. HiFT: Hierarchical Feature Transformer for Aerial Tracking // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2021: 15437-15446.
[60] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet Classification with Deep Convolutional Neural Networks // Proc of the 26th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2012: 1106-1114.
[61] XING D T, EVANGELIOU N, TSOUKALAS A, et al. Siamese Transformer Pyramid Networks for Real-Time UAV Tracking // Proc of the IEEE/CVF Winter Conference on Applications of Computer Vision. Washington, USA: IEEE, 2022: 1898-1907.
[62] GUO M Z, ZHANG Z P, FAN H, et al. Learning Target-Aware Representation for Visual Tracking via Informative Interactions // Proc of the 31st International Joint Conference on Artificial Intelligence. San Francisco, USA: IJCAI, 2022: 927-934.
[63] NI X Y, YUAN L, LÜ K. Efficient Single-Object Tracker Based on Local-Global Feature Fusion. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(2): 1114-1122.
[64] WANG Z A, LI M, PEI W J, et al. Exploring the Complementarity between Convolution and Transformer Matching for Visual Tracking. Knowledge-Based Systems, 2024, 300. DOI: 10.1016/j.knosys.2024.112184.
[65] XIONG J B, LING Q. Mask-Guided Siamese Tracking with a Frequency-Spatial Hybrid Network. IEEE Transactions on Circuits and Systems for Video Technology, 2025, 35(1): 103-117.
[66] SONG Z K, YU J Q, CHEN Y P, et al. Transformer Tracking with Cyclic Shifting Window Attention // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 8781-8790.
[67] GAO S Y, ZHOU C L, MA C, et al. AiATrack: Attention in Attention for Transformer Visual Tracking // Proc of the 17th European Conference on Computer Vision. Berlin, Germany: Springer, 2022: 146-164.
[68] FU Z H, FU Z H, LIU Q J, et al. SparseTT: Visual Tracking with Sparse Transformers // Proc of the 31st International Joint Conference on Artificial Intelligence. San Francisco, USA: IJCAI, 2022: 905-912.
[69] LIANG Y, LI Q Q, LONG F M. Global Dilated Attention and Target Focusing Network for Robust Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(2): 1549-1557.
[70] MA F, SHOU M Z, ZHU L C, et al. Unified Transformer Tracker for Object Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 8771-8780.
[71] ZHOU Z K, CHEN J Q, PEI W J, et al. Global Tracking via Ensemble of Local Trackers // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 8751-8760.
[72] LIU T P, LI J, WU J, et al. Tracking with Saliency Region Transformer. IEEE Transactions on Image Processing, 2024, 33: 285-296.
[73] SUN X L, SUN H J, JIANG S, et al. Multi-attention Associate Prediction Network for Visual Tracking. Neurocomputing, 2025, 614. DOI: 10.1016/j.neucom.2024.128785.
[74] HE K J, ZHANG C L, XIE S, et al. Target-Aware Tracking with Long-Term Context Attention. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(1): 773-780.
[75] TANG C M, WANG X, BAI Y C, et al. Learning Spatial-Frequency Transformer for Visual Object Tracking. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(9): 5102-5116.
[76] TANG C M, HU Q T, ZHOU G F, et al. Transformer Sub-Patch Matching for High-Performance Visual Object Tracking. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(8): 8121-8135.
[77] CHEN L K, GAO L, JIANG Y, et al. Local-Global Self-Attention for Transformer-Based Object Tracking. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(12): 12316-12329.
[78] CUI Y T, JIANG C, WANG L M, et al. MixFormer: End-to-End Tracking with Iterative Mixed Attention // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 13598-13608.
[79] WU H P, XIAO B, CODELLA N, et al. CvT: Introducing Convolutions to Vision Transformers // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2021: 22-31.
[80] XIE F, WANG C Y, WANG G T, et al. Correlation-Aware Deep Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 8741-8750.
[81] XIE F, YANG W K, WANG C Y, et al. Correlation-Embedded Transformer Tracking: A Single-Branch Framework. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(12): 10681-10696.
[82] HE K M, CHEN X L, XIE S N, et al. Masked Autoencoders Are Scalable Vision Learners // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2022: 15979-15988.
[83] LAN J P, CHENG Z Q, HE J Y, et al. ProContEXT: Exploring Progressive Context Transformer for Tracking // Proc of the IEEE International Conference on Acoustics, Speech and Signal Processing. Washington, USA: IEEE, 2023. DOI: 10.1109/ICASSP49357.2023.10094971.
[84] CAI Y D, LIU J, TANG J, et al. Robust Object Modeling for Visual Tracking // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2023: 9555-9566.
[85] XIE F, CHU L, LI J H, et al. VideoTrack: Learning to Track Objects via Video Transformer // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 22826-22835.
[86] WU Q Q, YANG T Y, LIU Z Q, et al. DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 14561-14571.
[87] ZHAO H J, WANG D, LU H C. Representation Learning for Visual Object Tracking by Masked Appearance Transfer // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 18696-18705.
[88] GAO S Y, ZHOU C L, ZHANG J. Generalized Relation Modeling for Transformer Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 18686-18695.
[89] YANG D W, HE J F, MA Y C, et al. Foreground-Background Distribution Modeling Transformer for Visual Object Tracking // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2023: 10083-10093.
[90] CHEN T, SAXENA S, LI L L, et al. Pix2Seq: A Language Modeling Framework for Object Detection[C/OL].[2025-11-10]. https://openreview.net/pdf?id=e42KbIw6Wb.
[91] CHEN X, PENG H W, WANG D, et al. SeqTrack: Sequence to Sequence Learning for Visual Object Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 14572-14581.
[92] WEI X, BAI Y F, ZHENG Y C, et al. Autoregressive Visual Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 9697-9706.
[93] BAI Y F, ZHAO Z Y, GONG Y H, et al. ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2024: 19048-19057.
[94] SHI J Z, YU Y, HUI B, et al. Historical States Modeling for Visual Tracking. Neural Computing and Applications, 2025, 37(7): 5831-5848.
[95] LIN L T, FAN H, ZHANG Z P, et al. Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance // Proc of the 18th European Conference on Computer Vision. Berlin, Germany: Springer, 2024: 300-318.
[96] CAI W R, LIU Q J, WANG Y H. HIPTrack: Visual Tracking with Historical Prompts // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2024: 19258-19267.
[97] SHI L T, ZHONG B N, LIANG Q H, et al. Explicit Visual Prompts for Visual Object Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(5): 4838-4846.
[98] ZHENG Y Z, ZHONG B N, LIANG Q H, et al. ODTrack: Online Dense Temporal Token Learning for Visual Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(7): 7588-7596.
[99] XIE J X, ZHONG B N, MO Z Y, et al. Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2024: 19300-19309.
[100] LI S W, YANG Y X, ZENG D, et al. Adaptive and Background-Aware Vision Transformer for Real-Time UAV Tracking // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2023: 13989-14000.
[101] KOU Y T, GAO J, LI B, et al. ZoomTrack: Target-Aware Non-Uniform Resizing for Efficient Visual Tracking // Proc of the 37th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2023: 50959-50977.
[102] YANG X Y, ZENG D, WANG X C, et al. Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking. Pattern Recognition, 2025, 161. DOI: 10.1016/j.patcog.2024.111278.
[103] LI Y X, LIU M Y, WU Y, et al. Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking // Proc of the 41st International Conference on Machine Learning. New York, USA: ACM, 2024: 28403-28420.
[104] WU Y, WANG X C, ZENG D, et al. Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking[C/OL].[2025-11-10]. https://arxiv.org/pdf/2407.05383.
[105] ZHU J W, CHEN X, DIAO H W, et al. Exploring Dynamic Transformer for Efficient Object Tracking. IEEE Transactions on Neural Networks and Learning Systems, 2025, 36(8): 15502-15514.
[106] XUE C C, ZHONG B N, LIANG Q H, et al. Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2025: 6730-6740.
[107] ZHU J W, TANG H Y, CHEN X, et al. Two-Stream Beats One-Stream: Asymmetric Siamese Network for Efficient Visual Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 2025, 39(10): 10959-10967.
[108] CUI Y T, SONG T H, WU G S, et al. MixFormerV2: Efficient Fully Transformer Tracking // Proc of the 37th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press, 2023: 58736-58751.
[109] WU Y, LI Y X, LIU M Y, et al. Learning an Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking. IEEE Transactions on Circuits and Systems for Video Technology, 2026, 36(2): 2403-2418.
[110] HONG L Y, LI J L, ZHOU X Y, et al. General Compression Framework for Efficient Transformer Object Tracking // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2025: 13427-13437.
[111] DONG S H, FENG Y H, YANG Q, et al. LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking[C/OL].[2025-11-10]. https://arxiv.org/pdf/2405.17660.
[112] LI S W, YANG X Y, WANG X C, et al. Learning Target-Aware Vision Transformers for Real-Time UAV Tracking. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62. DOI: 10.1109/TGRS.2024.3417400.
[113] WU Y, WANG X C, YANG X Y, et al. Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2025: 17103-17113.
[114] CHEN X, KANG B, WANG D, et al. Efficient Visual Tracking via Hierarchical Cross-Attention Transformer // Proc of the 17th European Conference on Computer Vision. Berlin, Germany: Springer, 2022: 461-477.
[115] GOPAL G Y, AMER M A. Mobile Vision Transformer-Based Visual Object Tracking[C/OL].[2025-11-10]. https://papers.bmvc2023.org/0800.pdf.
[116] WEI Q M, ZENG B, LIU J Q, et al. LiteTrack: Layer Pruning with Asynchronous Feature Extraction for Lightweight and Efficient Visual Tracking // Proc of the IEEE International Conference on Robotics and Automation. Washington, USA: IEEE, 2024: 4968-4975.
[117] KANG B, CHEN X, WANG D, et al. Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking // Proc of the IEEE/CVF International Conference on Computer Vision. Washington, USA: IEEE, 2023: 9578-9587.
[118] BLATTER P, KANAKIS M, DANELLJAN M, et al. Efficient Visual Tracking with Exemplar Transformers // Proc of the IEEE/CVF Winter Conference on Applications of Computer Vision. Washington, USA: IEEE, 2023: 1571-1581.
[119] GOPAL G Y, AMER M A. Separable Self and Mixed Attention Transformers for Efficient Object Tracking // Proc of the IEEE/CVF Winter Conference on Applications of Computer Vision. Washington, USA: IEEE, 2024: 6694-6703.
[120] WANG S L, CHENG G, LAI P J, et al. Multi-state Tracker: Enhancing Efficient Object Tracking via Multi-state Specialization and Interaction // Proc of the 33rd ACM International Conference on Multimedia. New York, USA: ACM, 2025: 4087-4096.
[121] ZONG C G, CHEN X, ZHAO J, et al. Enhancing the Two-Stream Framework for Efficient Visual Tracking. IEEE Transactions on Image Processing, 2025, 34: 5500-5512.
[122] GU A, DAO T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces[C/OL].[2025-11-10]. https://arxiv.org/pdf/2312.00752.
[123] ZHANG J M, LIANG C, CUI Y T, et al. TrackMamba: Mamba-Transformer Tracking[C/OL].[2025-11-10]. https://openreview.net/pdf?id=V7QRVEZ0le.
[124] XIE J X, ZHONG B N, LIANG Q H, et al. Robust Tracking via Mamba-Based Context-Aware Token Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2025, 39(8): 8727-8735.
[125] WANG Q W, ZHOU L Y, JIN P C, et al. TrackingMamba: Visual State Space Model for Object Tracking. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024, 17: 16744-16754.
[126] ZHANG C H, LIU L, WEN H, et al. MambaTrack: Exploiting Dual-Enhancement for Night UAV Tracking // Proc of the IEEE International Conference on Acoustics, Speech and Signal Processing. Washington, USA: IEEE, 2025. DOI: 10.1109/ICASSP49660.2025.10890855.
[127] KANG B, CHEN X, LAI S M, et al. Exploring Enhanced Contextual Information for Video-Level Object Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 2025, 39(4): 4194-4202.
[128] LI X H, ZHONG B N, LIANG Q H, et al. MambaLCT: Boosting Tracking via Long-Term Context State Space Model. Proceedings of the AAAI Conference on Artificial Intelligence, 2025, 39(5): 4986-4994.
[129] YU W H, WANG X C. MambaOut: Do We Really Need Mamba for Vision? // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2025: 4484-4496.
[130] ZHU J W, LAI S M, CHEN X, et al. Visual Prompt Multi-modal Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 9516-9526.
[131] WU Z W, ZHENG J L, REN X X, et al. Single-Model and Any-Modality for Video Object Tracking // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2024: 19156-19166.
[132] CHEN X, KANG B, ZHU J W, et al. Unified Sequence-to-Sequence Learning for Single- and Multi-modal Visual Object Tracking[C/OL].[2025-11-10]. https://arxiv.org/pdf/2304.14394.
[133] HONG L Y, YAN S L, ZHANG R R, et al. OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2024: 19079-19091.
[134] CHEN X, KANG B, GENG W T, et al. SUTrack: Towards Simple and Unified Single Object Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 2025, 39(2): 2239-2247.
[135] ZHANG H P, YUAN D, SHU X, et al. A Comprehensive Review of RGBT Tracking. IEEE Transactions on Instrumentation and Measurement, 2024, 73. DOI: 10.1109/TIM.2024.3436098.
[136] OU Z, YING G, ZHANG D W, et al. A Survey of RGB-Depth Object Tracking. Journal of Computer-Aided Design & Computer Graphics, 2024, 36(11): 1673-1690. (in Chinese)
[137] ZHANG D W, WANG X, HE X W, et al. Research Progress of RGBT Object Tracking Based on Deep Learning. Computer Engineering and Applications, 2025, 61(19): 43-59. (in Chinese)
[138] YAN B, JIANG Y, SUN P Z, et al. Towards Grand Unification of Object Tracking // Proc of the 17th European Conference on Computer Vision. Berlin, Germany: Springer, 2022: 733-751.
[139] YAN B, JIANG Y, WU J N, et al. Universal Instance Perception as Object Discovery and Retrieval // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2023: 15325-15336.
[140] WANG J K, WU Z X, CHEN D D, et al. OmniTracker: Unifying Visual Object Tracking by Tracking-with-Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(4): 3159-3174.
[141] WANG J K, CHEN D D, LUO C, et al. OmniViD: A Generative Framework for Universal Video Understanding // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2024: 18209-18220.
[142] YANG C Y, HUANG H W, CHAI W H, et al. SAMURAI: Motion-Aware Memory for Training-Free Visual Object Tracking with SAM2. IEEE Transactions on Image Processing, 2026, 35: 970-982.
[143] YANG J Y, GAO M Q, LI Z, et al. Track Anything: Segment Anything Meets Videos[C/OL].[2025-11-10]. https://arxiv.org/pdf/2304.11968.
[144] ZHU J W, CHEN Z Y, HAO Z Q, et al. Tracking Anything in High Quality[C/OL].[2025-11-10]. https://arxiv.org/pdf/2307.13974.
[145] CHENG Y M, LI L L, XU Y Y, et al. Segment and Track Anything[C/OL].[2025-11-10]. https://arxiv.org/pdf/2305.06558.
[146] VIDENOVIC J, LUKEZIC A, KRISTAN M. A Distractor-Aware Memory for Visual Object Tracking with SAM2 // Proc of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, USA: IEEE, 2025: 24255-24264.
[147] XIAO Y, ZHAO J C, LU A D, et al. Cross-Modulated Attention Transformer for RGBT Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 2025, 39(8): 8682-8690.
[148] XUE Y L, JIN G D, ZHONG B N, et al. FMTrack: Frequency-Aware Interaction and Multi-expert Fusion for RGB-T Tracking. IEEE Transactions on Circuits and Systems for Video Technology, 2026, 36(2): 1655-1667.
[149] WU K, CHEN H, WANG C R, et al. Hierarchical Instruction-Aware Embodied Visual Tracking[C/OL].[2025-11-10]. https://arxiv.org/pdf/2505.20710.
[150] WU K, XU S H, CHEN H, et al. VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Visual-Language Models // Proc of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Washington, USA: IEEE, 2025: 13154-13161. |